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Abstract 

The  advent  of  multimedia  calls  for  new  scheduling  paradigms  to  handle  the  combination  of  time-critical  and 
conventional  applications  present  on  many  multimedia  systems.  The  scheduler  of  the  Mach  3.0  Microkernel  has 
been  rewritten  to  allow  a  wide  spectrum  of  scheduling  policies,  from  real-time  through  time-sharing  to  background, 
to  be  selected  simultaneously  for  different  tasks  executing  on  the  same  processor.  Scheduling  policies  can  be  set  for 
a  task  or  for  individual  threads  within  the  task.  The  set  of  scheduling  policies  allowed  on  a  processor  or  set  of 
processors  may  be  dynamically  altered.  Scheduling  parameters  can  be  set  individually  for  each  thread,  task,  or 
scheduling  policy  enabled  on  a  processor.  Each  scheduling  policy  may  have  its  own  format  for  parameters;  they  are 
not  limited  to  integer  priorities.  New  scheduling  policies  may  be  configured  into  a  kernel  and  may  be  ordered  in  any 
way  desired.  The  resulting  system  provides  enough  flexibility  for  experimentation  with  new  scheduling  regimes,  yet 
is  efficient  enough  to  allow  a  reasonable  number  of  scheduling  policies  to  coexist.  When  configured  with  both 
real-time  and  timesharing  schedulers,  the  system  smoothly  supports  both  conventional  and  time-critical  applications. 

E-Mail:  dbg@cs.cmu.edu 


This  research  was  sponsored  by  the  Advanced  Research  Projects  Agency  under  contract  number  DABT63-93- 
C-0054.  The  views  and  conclusions  contained  in  this  document  are  those  of  the  author  and  should  not  be  interpreted 
as  representing  the  official  policies,  either  expressed  or  implied,  of  the  Advanced  Research  Projects  Agency  or  of 
the  U.S  government 


Keywords:  Operating  systems,  scheduling  policies,  real-time  scheduling,  Mach. 


1 


1.  Introduction 

General-purpose  and  real-time  computation  have  existed  in  two  separate  worlds.  General-purpose  operating 
systems  share  processors,  large  address  spaces,  and  file  systems  between  different  users,  but  do  not  guarantee 
performance.  Systems  that  can  meet  the  strict  timing  requirements  of  real-time  applications  have  often  been  written 
for  limited  hardware  and  are  customized  for  one  specific  purpose. 

However,  the  introduction  of  multimedia  into  business  applications  on  personal  computers  has  brought  the  two 
worlds  together.  Users  expect  smooth  video  display  to  coexist  with  fast  spreadsheet  calculation.  This  requires  an 
operating  system  that  can  support  both  real-time  and  conventional  applications,  without  compromising  either  one. 
Ideally,  the  resulting  system  should  allow  both  general  and  real-time  applications  to  make  full  use  of  the  hardware 

and  operating  system. 

Recent  research  on  real-time  scheduling  has  provided  a  firmer  theoretical  and  practical  basis  for  designing 
real-time  operating  systems.  A  considerable  contribution  to  this  area  has  come  from  the  Advanced  Real  Time 
Systems  group  (ARTS)  at  Carnegie  Mellon  University  [8],  Members  of  the  group  have  explored  the  rate-monotonic 
and  earliest-deadline-first  scheduling  algorithms  for  real-time  jobs.  Building  a  real-time  application  using  these 
scheduling  paradigms  requires  analyzing  the  computation  times  of  the  threads  in  the  application.  The  classical  time 
line  schedule,  in  contrast,  requires  determining  exactly  which  activities  are  executing  at  each  point  in  time. 

These  scheduling  solutions  have  been  incorporated  in  RT-Mach  [9],  a  version  of  the  Mach  kernel  incorporating 
real-time  features.  It  provides  selectable  scheduling  policies  suitable  for  real-time  applications,  taken  from  the 
Integrated  Time-Driven  Scheduler  [7]  written  for  the  ARTS  system.  The  interface  between  the  scheduling  policy 
modules  and  the  rest  of  the  kernel  is  written  so  that  new  scheduling  policies  can  easily  be  added  to  the  kernel. 

At  first,  scheduling  requirements  for  time-critical  applications  and  conventional  jobs  appear  completely 
incompatible.  The  primary  objective  of  a  real  time  scheduler  is  to  meet  the  timing  requirements  of  the  application, 
not  to  fairly  distribute  the  time  among  different  applications.  A  low-priority  job  may  be  delayed  indefinitely  by 
high-priority  jobs;  this  is  acceptable  and  expected.  However,  if  the  conventional  jobs  are  given  the  lowest  priority, 
lower  than  any  of  the  time-critical  tasks,  real-time  applications  will  automatically  be  given  preference,  and  both 
types  of  applications  can  run  simultaneously.  The  time  not  used  by  the  real-time  applications  is  split  fairly  among 
the  conventional  applications  by  the  timesharing  scheduler. 

The  RT-Mach  kernel  itself  falls  short  of  meeting  these  requirements.  Each  processor  is  restricted  to  running 
exactly  one  scheduling  policy  at  any  time.  Conventional  applications  must  therefore  be  shut  down  when  a  real-time 
application  is  to  be  run.  Since  some  conventional  applications  use  fixed  priority  and  time  sharing  scheduling 
simultaneously,  even  these  cannot  be  handled.  The  only  provision  for  running  applications  using  more  than  one 
scheduling  policy  is  to  write  a  customized  policy  module  that  combines  the  policies  and  imposes  an  ordering  on 
them.  Such  a  customized  solution  clearly  cannot  be  extended  beyond  a  very  small  set  of  policies. 


2.  New  Mach  Scheduling  Paradigm 

The  new  Mach  scheduling  paradigm  incorporates  the  real-time  scheduling  algorithms  from  RT-Mach,  but  places 
them  within  a  framework  which  allows  them  to  coexist  with  timesharing  scheduling  for  conventional  applications. 
The  scheduling  paradigm  imposes  an  ordering  on  the  applications’  threads,  depending  on  the  scheduling  policies  to 
which  they  are  assigned.  The  conventional  priority  value  used  to  schedule  threads  has  been  changed  to  a  two- 
component  value:  ( policy ,  ordering).  Threads  are  ordered  first  by  their  scheduling  policy:  time-critical  threads  have 
priority  over  all  non-time-critical  threads.  Within  each  scheduling  policy,  an  ordering  is  also  imposed.  Time-critical 
threads  may  be  ordered  by  their  deadlines;  non-time-critical  threads  are  ordered  by  an  arbitrary  priority  value. 
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As  in  a  conventional  scheduler,  the  next  thread  to  run  is  determined  by  the  priorities  of  all  the  currently  runnable 
threads.  Since  the  thread’s  scheduling  policy  is  the  first  component  of  its  priority  value,  any  thread  in  a  higher 
scheduling  policy  (for  example,  real-time)  is  automatically  of  higher  priority  than  any  thread  in  a  lower  scheduling 
policy  (for  example,  time-sharing).  Interaction  between  threads  in  different  scheduling  policies  is  resolved  by 
obeying  the  ordering  of  the  policies.  In  contrast  to  the  Real-Time  Mach  system,  it  is  unnecessary  for  any  one 
scheduling  policy  to  know  about  the  other  possible  policies  that  have  been  configured  in  order  to  provide  the  correct 
ordering. 

The  two-component  priority  value  also  allows  the  scheduler  interfaces  that  set  thread  priorities  to  be  generalized 
for  multiple  scheduling  policies.  Again,  the  priority  value  passed  to  the  interface  includes  both  scheduling  policy 
and  relative  ordering  within  that  policy.  The  format  of  the  ordering  component  has  been  made  flexible,  to  allow 
any  scheduling  policy  to  be  supported  by  these  uniform  interfaces. 


2.1.  Available  Scheduling  Policies 

The  Mach  kernel  is  normally  configured  with  the  following  scheduling  policies: 

•  Earliest  Deadline  First 

The  thread  with  the  earliest  deadline  (in  actual  time)  is  scheduled  to  run  first  This  policy  provides  for 
real-time  systems  where  threads  are  dynamically  added  and  removed. 

•  Rate  Monotonic 

This  handles  a  group  of  periodic  real-time  threads  whose  processor  usage  can  be  analyzed  in  advance. 
Threads  are  run  in  order  of  their  periods,  shortest  period  first  Non-periodic  threads  are  run  after  all 
periodic  threads,  in  no  particular  order. 

•  Fixed  Priority 

Each  thread  has  a  priority,  ranging  from  0  (highest)  to  31  (lowest).  The  runnable  thread  with  the 
highest  priority  value  is  chosen  to  run  first 

•  Time  Sharing 

Each  thread  is  assigned  a  priority  from  0  (highest)  to  31  (lowest).  In  addition,  a  priority  increment 
value  is  assigned,  depending  on  the  amount  of  processor  time  used.  The  increment  depends  on  a 
running  exponential  average  of  the  processor  time  used  during  the  last  second,  adjusted  for  the  total 
load  on  the  processor. 

•  Background 

Background  threads  are  run  only  if  no  other  threads  are  eligible  to  run. 


22.  External  Interfaces 

The  interfaces  to  the  priority  mechanism  have  been  generalized  to  support  selectable  scheduling  policies,  based 
on  older  Mach  interfaces  which  only  supported  timesharing  scheduling.  Priorities  appear  in  four  places  in  the  Mach 
kernel: 

•  Each  thread  has  a  base  priority  value.  For  most  scheduling  policies,  this  is  the  actual  priority  value;  for 
timesharing,  an  increment  based  on  CPU  usage  is  added  to  form  the  current  priority. 

•  Each  thread  also  has  a  maximum  priority  or  priority  limit.  An  unprivileged  user  cannot  set  the  thread’s 
priority  higher  than  its  maximum  priority. 

•  A  task  has  a  default  priority,  used  to  initialize  the  base  priority  of  newly  created  threads.  One  can  also 
change  the  priority  of  all  of  the  threads  in  a  task  by  changing  the  task’s  priority. 

•  A  processor  set  has  a  priority  limit  used  to  control  the  priorities  of  all  of  the  threads  running  on  that 
processor  or  group  of  processors. 
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Following  the  scheduling  paradigm,  all  of  these  priorities  are  now  two-component  values:  policy,  parameters. 
Each  policy  has  its  own  notion  of  parameters  for  base  priority  and  priority  limit.  For  the  timesharing  and  fixed- 
priority  policies,  these  are  still  numeric  values,  with  lower  values  denoting  higher  priorities.  For  the  real-time 
policies,  the  parameter  is  a  time  (the  thread’s  period  or  deadline).  A  corresponding  limit  value  is  a  lower  limit  on 
the  period  of  periodic  threads. 

The  task’s  default  priority  has  become  a  default  scheduling  policy  for  newly  created  threads  within  the  task.  In 
addition  to  supplying  the  base  priorities  for  all  new  threads,  this  allows  a  user  to  easily  specify  that  all  the  threads 
within  a  particular  task  run  under  the  same  policy  (for  example,  they  are  all  real-time  threads  scheduled  under  the 
rate-monotonic  policy). 

The  processor  set’s  data  has  been  expanded  to  hold  limit  values  for  each  scheduling  policy  currently  enabled. 
Scheduling  policies  may  be  freely  enabled  or  disabled  on  each  processor  set,  by  users  with  appropriate  privilege. 
On  a  multiprocessor,  a  user  may  force  all  time-sharing  threads  to  run  on  a  small  number  of  processors  by  grouping 
them  into  a  processor  set  and  enabling  timesharing  only  on  this  processor  set. 


The  Mach  kernel  interfaces  for  the  flexible  scheduling  policy  support  are: 


kem_retum_t  processor_set_policy_add  ( 
processor_set_t  pset, 

int  policy, 

policy_param_t  limit , 

unsigned  int  count ) 


Adds  the  new  scheduling  policy  policy  to  the  processor  set  pset,  with  the  scheduling  parameter  maximum  values 
set  by  limit. 

Jcem_return_t  processor_set_policy_remove  ( 
processor_set_t  pset, 

int  policy) 


Removes  the  scheduling  policy  policy  from  the  processor  set  pset.  Threads  running  policy  have  their  policy  reset 
to  timesharing,  or  to  background  if  timesharing  is  not  available  on  the  processor  set. 

The  background  policy  cannot  be  removed.  This  guarantees  that  a  processor  set  will  have  at  least  one  available 
scheduling  policy. 

kern_retum_t  processor_set_policy_limit  ( 
processor_set_t  pset, 

int  policy, 

policyjparam_t  limit, 

unsigned  int  count , 

boolean_t  change_threads ) 

Changes  the  parameter  limit  values  for  policy  on  processor  set  pset  to  the  new  limit  values.  These  limit  values 
will  affect  new  threads  that  are  created  as  assigned  to  processor  set  pset,  or  assigned  to  it. 


If  change  jhreads  is  TRUE,  any  thread  assigned  to  pset,  and  running  policy,  whose  scheduling  parameters  violate 
the  new  limit  values,  will  have  its  scheduling  parameters  reduced  to  limit.  Otherwise,  existing  threads  will  be 
unaffected.  This  allows  a  privileged  user  to  set  up  several  high-priority  threads,  then  change  the  limit  values, 
preventing  any  new  threads  from  being  assigned  as  high  a  priority  as  the  existing  threads. 
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kern_retum__t  thread_set_policy  ( 


thread_t  thread, 

processor_set_t  pset, 

int  policy, 

policy_param_t  param, 

unsigned  int  count) 


Sets  thread  to  run  the  scheduling  policy  policy,  with  its  scheduling  parameters  (priority,  period,  deadline,  etc.)  set 
to  param.  If  no  scheduling  parameters  are  supplied  ( count  is  zero),  the  default  values  for  the  policy  will  be  used. 

The  thread’s  processor  set,  pset,  must  be  supplied  as  a  privilege  key  to  allow  the  policy  to  be  selected;  if  pset  is 
not  the  processor  set  to  which  the  thread  is  currently  assigned,  the  call  will  fail. 

The  supplied  scheduling  parameters  must  be  less  than  or  equal  to  the  limit  values  for  pset  for  this  policy; 
otherwise  the  call  will  fail. 

kern_return_t  thread  set  policy  param  ( 

thread_t  thread, 

boolean_t  set_limit, 

policy_param_t  param, 

unsigned  int  count) 


Sets  new  scheduling  parameter  values  for  thread's  current  scheduling  policy.  If  set  limit  is  true,  this  call  also  sets 
the  thread’s  parameter  limit  value.  The  parameters  must  be  valid  for  the  current  scheduling  policy,  and  less  than  or 
equal  to  the  limit  values  for  the  thread’s  current  processor  set  for  this  policy,  and  less  than  or  equal  to  the  thread’s 
limit  values.  This  allows  a  privileged  user  to  set  a  maximum  priority  (or  minimum  period)  for  a  thread,  and  let  an 
unprivileged  user  vary  the  thread’s  priority  up  to  the  maximum  priority,  but  no  higher. 

kern_return_t  thread_set_policy_limit 
thread_t  thread, 

processor_set_t  pset, 

policy_param_t  limit, 

unsigned  int  count) 


Sets  new  scheduling  parameter  limit  values  for  thread's  current  scheduling  policy.  The  thread’s  current  processor 
set,  pset,  must  be  supplied  as  a  privilege  key;  if  this  is  not  the  processor  set  to  which  the  thread  is  currently  assigned, 
the  call  will  fail. 


kera_retum_t  task_set_def  ault_policy  ( 
task_t  task, 

processor_set_t  pset, 

int  policy, 

policy_jparam_t  param, 

unsigned  int  count, 

boolean_t  change_threads ) 


Sets  the  default  scheduling  policy  for  task  to  be  policy,  and  the  default  scheduling  parameters  for  that  policy  to  be 
param.  The  default  policy  and  parameters  are  assigned  to  new  threads  created  within  the  task. 

If  change  Jhreads  is  true,  all  existing  threads  within  the  task  are  changed  to  run  policy  with  the  new  scheduling 
parameters.  If  any  thread  cannot  be  made  to  run  policy  with  these  parameters,  the  call  will  return  KERN_FAILURE. 
This  may  happen  if  a  thread  is  assigned  to  a  processor  set  that  does  not  run  policy,  or  if  the  scheduling  parameters 
exceed  the  limit  values  for  a  thread  or  its  assigned  processor  set.  If  the  call  returns  KERN_FAILURE,  as  many 
threads  as  possible  will  be  running  the  new  policy,  but  there  will  be  no  indication  of  which  threads  are  not 
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The  task’s  processor  set,  pset,  must  be  supplied  as  a  privilege  key  to  allow  the  policy  to  be  selected;  if  pset  is  not 
the  processor  set  to  which  the  task  is  cuirendy  assigned,  the  call  will  fail. 

3.  Internal  Reorganization 

To  allow  new  scheduling  policies  to  be  easily  added  to  the  Mach  kernel,  the  routines  that  manage  the  queue  of 
runnable  threads  have  been  split  into  two  distinct  modules,  connected  by  a  well-defined  interface.  The  run_queue 
module  supplies  the  abstraction  of  a  queue  of  runnable  threads  to  the  rest  of  the  scheduler.  Routines  are  provided  to 
add  a  thread  to  the  run  queue,  to  choose  the  highest  priority  thread  from  the  run  queue  and  remove  it,  and  to  remove 
a  thread  from  the  run  queues  to  adjust  its  priority  or  terminate  it.  Each  of  these  routines  calls  a  corresponding 
routine  in  the  thread’s  scheduling  policy  to  implement  the  actual  run  queue  structure. 


3.1.  Run  Queue  Module 

Fsrh  processor  set  has  a  run  queue  header  that  holds  pointers  to  the  run  queues  for  each  policy  enabled  on  that 
processor  set.  There  is  a  common  lock  for  all  of  the  run  queues,  a  count  of  the  total  number  of  runnable  threads,  and 
a  bit  map  of  the  run  queues  that  contain  runnable  threads. 

The  thread  setrun  routine  adds  a  runnable  thread  to  the  run  queues.  It  calls  the  enqueuing  routine  for  the  thread’s 
scheduling  policy  to  add  the  thread  to  its  policy’s  run  queue,  and  sets  the  bit  in  the  bit  map  to  indicate  that  this 
policy  has  runnable  threads. 

The  remrunq  routine  removes  a  runnable  thread  from  the  run  queues.  It  calls  the  remove-from-queue  routine  to 
remove  the  thread  from  its  policy’s  run  queue,  and  clears  the  bit  in  the  bitmap  if  the  policy’s  run  queue  is  now 
empty. 

The  processor  searches  for  a  new  thread  to  run  by  calling  thread  select.  This  routine  scans  the  bit  map  for  the 
highest  priority  scheduling  policy  that  contains  runnable  threads.  It  then  calls  the  dequeueing  routine  for  that  run 
queue’s  scheduling  policy,  which  removes  and  returns  the  highest  priority  thread  for  that  policy.  The  scan  of  the  bit 
map  makes  finding  the  highest-priority  scheduling  policy  very  fast.  As  in  rem  runq,  the  policy’s  bit  in  the  bitmap  is 
cleared  if  the  policy’s  run  queue  has  no  more  runnable  threads. 

At  clock  interrupts,  routine  csw  needed  is  called  to  check  the  run  queues  to  determine  whether  the  current  thread 
should  be  preempted  by  a  higher  priority  thread.  Again,  this  is  done  by  scanning  the  bit  map  of  policies  with 
runnable  threads.  However,  the  search  stops  at  the  policy  belonging  to  the  current  thread.  Clearly,  the  current 
thread  cannot  be  preempted  by  a  thread  in  a  lower  scheduling  class;  similarly,  it  will  always  be  preempted  by  a 
thread  in  a  higher  scheduling  class.  If  the  first  non-empty  run  queue  is  the  current  thread’s  run  queue,  a  policy- 
specific  routine  is  called  to  compare  the  current  thread’s  priority  (ordering)  with  the  highest-priority  thread  on  that 
run  queue.  If  the  routine  indicates  that  the  thread  on  the  run  queue  has  higher  priority,  the  current  thread  will  be 
preempted. 


32.  Scheduling  Policies 

The  run  queue  module  and  the  Mach  external  interfaces  call  the  individual  scheduling  policies  through  a  well- 
defined  interface.  The  scheduling  policy  is  structured  as  an  object  A  policy  that  is  active  on  a  processor  set  has  its 
own  set  of  run  queues  and  other  local  data  structures  that  belong  to  that  instance  of  the  policy.  The  policy  holds  a 
pointer  to  the  set  of  interface  routines  that  are  called  for  all  operations  on  run  queues,  tasks,  and  threads  that  run  the 
policy. 
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The  local  run  queue  structure  contains  a  common  header  that  holds  a  count  of  the  number  of  runnable  threads  on 
the  queue  and  a  pointer  to  the  function  vector  for  the  scheduling  policy.  Following  the  header  is  a  policy-specific 
structure  that  includes  the  priority  limit  value  for  the  instance  of  the  policy,  and  the  list  of  runnable  threads.  Placing 
the  thread  list  in  the  policy-specific  portion  allows  it  to  be  tailored  to  the  policy.  For  example,  the  timesharing  and 
fixed-priority  policies  use  an  array  of  32  queues,  one  per  priority  level.  In  contrast,  the  real-time  policies  use  a 
single  ready  queue,  on  which  threads  are  queued  in  order  of  deadline  time  or  period. 

The  operations  are  listed  here,  along  with  the  kernel  routines  that  call  them: 

Run  queue  manipulation: 

thread_t  (*thread_dequeue) ( 

run_queue_t  runq) 

Called  by  thread  jelect  to  remove  the  highest  priority  thread  from  runq ,  and  return  it.  The  run  queue  is  known  to 
be  non-empty. 

boolean_t  (*thread_enqueue) ( 

run_queue_t  runq, 

thread_t  thread, 

boolean_t  mayjpreexnpt ) 

Called  by  thread_setrun  to  enqueue  thread  on  runq,  in  order  by  priority.  The  queuing  policy  is  invisible  outside 
the  policy  module  itself. 

If  may _preempt  is  TRUE,  the  routine  returns  whether  thread  is  of  higher  priority  (according  to  the  policy)  than 
the  currently  executing  thread.  Otherwise  the  routine  returns  FALSE. 

void  ( * thread_remqueue ) ( 

run_queue_t  runq, 

thread_t  thread) 

Removes  thread  from  runq.  The  thread  is  known  to  be  on  the  run  queue. 

Called  by  rem  runq  (from  thread_hold  and  threadjdowait )  to  suspend  a  thread  that  is  on  the  run  queues.  Also 
called  by  routines  that  change  the  thread’s  policy  or  scheduling  parameters.  If  the  thread  is  runnable,  these  routines 
remove  it  from  its  current  run  queue,  alter  its  policy  or  parameters,  and  replace  it  on  the  (possibly  new)  run  queue  at 
its  new  position. 

Thread  priority  update  and  context  switch  checks: 

boolean_t  (*csw_needed) ( 

run_queue_t  runq, 

thread_t  thread) 

Called  by  cswjieeded  from  clock-interrupt  code  to  determine  whether  thread  (the  currently  running  thread) 
should  be  preempted  by  the  highest-priority  thread  in  runq.  Returns  TRUE  if  this  is  so.  Thread  is  known  to  run 
runq's  scheduling  policy. 

void  (*clock_sched) ( 

thread_t  thread, 

boolean_t  end_of_quantum) ; 

Called  by  the  system  clock  interrupt  to  perform  random  scheduling  decisions  on  thread.  End_of_quantum  is 
TRUE  if  the  clock  interrupt  marks  the  end  of  the  current  quantum.  Thread  is  the  currently  running  thread. 
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This  is  used  by  the  timesharing  policy  to  update  thread's  CPU  usage  and  current  priority.  For  other  policies,  this 
checks  whether  there  is  any  other  reason  to  preempt  thread. 

void  (*update_priority)  ( 

thread_t  thread) 

Called  by  thread jetrun  to  perform  random  scheduling  decisions  on  thread.  This  differs  from  clock_sched  in  that 
the  thread  is  just  being  made  runnable. 

This  is  used  by  the  timesharing  policy  to  update  thread’s  CPU  usage  and  current  priority  when  thread  has  not  run 
for  a  period  of  time. 

Processor  set  priority  limits: 

kem_retum_t  (*runq_set_limit)  ( 

run_queue_t  runq, 

policy_param_t  limit , 
natural_t  count) 

Sets  limitfcount]  as  the  limit  values  for  the  scheduling  policy  on  a  processor  set.  Runq  is  the  run  queue  for  the 
policy  on  the  processor  set 

lcem_retum_t  ( *  runq  get  limit )  ( 

run_queue_t  runq, 

policy_param_t  limit, 
natural_t  * count) 

Returns  the  the  limit  values  for  the  scheduling  policy  on  a  processor  set  in  limit[countJ.  Runq  is  the  run  queue  for 
the  policy  on  the  processor  set. 

Thread  priority  values  and  limits: 

kem_retum_t  (*thread_set_limit)  ( 

thread_t  thread, 

policy_param_t  limit, 
natural_t  count) 

Sets  limit  [count]  as  thread's  limit  values  for  the  scheduling  policy. 

ke m_retum_t  (*thread_set__param)  ( 

thread__t  thread, 

policy_param_t  param, 
natural_t  param_count , 

boolean_t  new_policy, 

boolean_t  check_limits) 

Sets  param[ count]  as  the  scheduling  policy  parameters  for  thread.  If  param  is  NULL,  thread’s  current 
parameters  are  not  changed  if  new _policy  is  FALSE  (thread  is  running  the  policy);  otherwise,  the  thread’s 
parameters  are  set  to  the  policy-specific  default  values. 

The  policy  parameters  (supplied,  current,  or  default)  are  checked  against  the  limit  values  for  the  policy. 
Parameters  that  are  invalid  for  the  policy  or  that  exceed  the  limit  values  result  in  an  error  if  check  limits  is  TRUE; 
in  this  case,  the  thread’s  policy  parameters  are  not  changed.  Otherwise,  the  policy  parameters  are  silently  set  to  the 
limit  values. 
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If  new _policy  is  TRUE,  thread's  limit  values  are  used  for  the  check.  Otherwise,  the  limit  values  are  taken  from 

the  processor  set  to  which  thread  is  assigned. 

kem_retum_t  (*thread_get_param)  ( 
thread_t  thread, 

policy_param_t  param, 
natural_t  * count) 

Returns  thread’s  policy  parameters  and  limit  values  for  the  scheduling  policy  in  param[ count] . 

Task  priority  values: 

kem_retum_t  (*task_set_param)  ( 
task_t  task, 

policy_param_t  param, 
natural_t  count) 

Sets  param[ count]  as  the  default  policy  parameters  for  task. 

kem_retum_t  (*task  get  param)  ( 
task_t  task, 

policy_param_t  param, 
natural_t  *count) 

Returns  task’s  default  policy  parameters  for  the  scheduling  policy  in  param[count] . 


4.  Evaluation 

The  scheduling  policy  framework  has  been  used  to  implement  fixed,  high  priorities  for  server  threads  in  the  single 
server  Unix  emulation  under  Mach  [2],  User  threads  run  under  the  time-sharing  scheduling  policy.  Server  threads 
use  the  fixed  priority  policy,  but  may  temporarily  run  as  background  threads  while  waiting  for  synchronization 
variables  [1].  Although  the  indirect  procedure  calls  to  the  scheduler  policy  modules  are  in  the  critical  context  switch 
and  system  clock  interrupt  paths,  measurements  on  compilation  benchmarks  show  that  they  have  no  effect  on  system 
performance.  Part  of  this  result  is  due  to  a  reorganization  of  the  scheduler  code  that  was  done  concurrently  with  the 
other  work,  reducing  the  aggregate  path  lengths  in  spite  of  the  extra  functionality. 

Results  from  the  Real-Time  Mach  group  show  that  switching  from  timesharing  to  real-time  scheduling  policies 
does,  in  fact,  lead  to  correct  behavior  for  a  time-constrained  application.  Tokuda  [9]  evaluates  the  visual 
performance  of  a  program  that  simulates  the  rotation  of  a  molecule  by  using  individual  threads  to  move  each  atom  to 
its  new  position.  Using  time-sharing  scheduling,  the  atoms  drift  apart  and  the  molecule  loses  its  shape;  under  the 
rate-monotonic  scheduling  policy,  the  atoms  in  the  molecule  correctly  move  in  synchrony.  Similar  test  programs 
running  under  the  Mach  kernel  with  this  scheduling  policy  framework,  and  using  rate  monotonic  scheduling,  show 
identical  behavior. 

A  truer  test  of  the  scheduling  framework  is  to  run  real-time  and  time-sharing  threads  simultaneously.  When  both 
the  test  programs  and  a  time-sharing  application  are  running,  the  test  program  still  shows  its  desired  synchronous 
behavior,  up  to  the  point  where  the  kernel  starts  paging  to  disk.  Above  this  point,  of  course,  there  is  no  guaranteed 
behavior,  since  the  real-time  application  is  not  wired  into  memory. 

The  scheduler  framework  allows  the  easy  addition  of  new  scheduling  policies,  such  as  Mercer’s  processor 
capacity  reservation  scheme  [3],  To  install  a  new  policy,  only  the  routines  that  queue  threads  according  to  the 
policy’s  priority  scheme  need  to  be  written.  The  surrounding  framework  manages  the  interactions  between  threads 
running  different  policies,  and  distributes  threads  to  CPUs  in  a  multiprocessor  system.  The  background  policy,  for 
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example,  was  added  to  the  system  with  about  an  hour’s  work. 


5.  Related  Work 

Other  operating  systems  have  incorporated  both  real-time  and  conventional  processing.  The  CP-V  operating 
system  for  the  SDS  Sigma  7  [5]  supported  both  conventional  (batch)  and  real-time  processes  via  a  common  run 
queue  structure.  Each  process  was  assigned  a  base  priority,  with  higher  (numeric)  values  giving  lower  priority. 
Conventional  processes  had  positive  priority  values;  the  priorities  were  varied  according  to  the  process’  CPU  usage, 
as  in  the  Mach  timesharing  scheduling  policy.  Real-time  processes  had  negative  priorities  that  were  not  varied.  A 
negative  priority  also  prevented  a  process’  memory  from  being  swapped  to  disk. 

SunSoft’s  Solaris  2.1  operating  system  [6]  similarly  allows  processes  to  be  designated  as  "real-time."  Real-time 
processes,  as  in  CP-V,  are  given  fixed  priority,  and  preempt  timesharing  threads.  They  are  also  exempt  from  the 
usage  and  load-dependent  priority  adjustments  for  timesharing  threads. 

The  actual  real-time  scheduling  policies  provided  by  these  two  systems  include  only  fixed-priority,  round-robin 
scheduling.  The  application  designer  must  carefully  calculate  the  sequence  of  execution  for  the  set  of  real-time 
processes,  and  vary  priorities  at  appropriate  points  during  the  application  in  order  to  ensure  timely  response.  The 
rate-monotonic  and  earliest-deadline-first  policies  included  in  the  Mach  kernel,  in  contrast,  automatically  vary 
process  priorities  to  guarantee  completion;  the  designer  merely  has  to  analyze  the  application’s  total  computational 
load,  not  the  exact  mix  of  processes  executing  at  each  point. 

The  Mach/RT  project  [4],  like  RT-Mach,  extends  the  Mach  microkernel  to  incorporate  real-time  scheduling 
policies.  Scheduling  policies  are  separated  into  individual  modules  that  may  be  configured  into  a  kernel;  the 
description  places  more  emphasis  on  the  configurable  scheduler  interface  than  on  the  scheduling  policies 
themselves.  As  in  RT-Mach,  only  one  scheduling  policy  may  be  in  effect  on  a  processor  at  any  one  time.  The  only 
way  to  share  a  machine  between  conventional  and  time-critical  tasks  is  to  write  a  customized  scheduling  policy.  In 
contrast,  the  rewritten  Mach  scheduler  automatically  dispatches  both  classes  of  threads  in  the  correct  order. 

The  closest  commercial  scheduler  equivalent  to  Mach’s  multiple-scheduler  framework  is  IBM’s  OS/2  scheduler. 
Threads  may  be  assigned  to  one  of  four  ordered  scheduling  classes:  real-time,  fixed  priority,  timesharing,  and 
background.  Each  scheduling  class  has  32  priority  levels.  However,  the  implementation  does  not  provide  for 
adding  more  scheduling  classes:  threads  are  queued  onto  one  of  128  queues,  and  the  scheduler  looks  through  all  of 
those  queues  to  find  the  next  runnable  thread.  Adding  a  true  rate-monotonic  scheduler  requires  not  only  that  more 
run  queues  be  added,  but  that  each  thread’s  rate,  measured  in  microseconds,  be  reduced  to  one  of  the  small  set  of 
added  queues.  This  may  result  in  priority  inversion  among  the  threads  assigned  to  the  same  run  queue. 

6.  Conclusion 

This  paper  has  presented  a  scheduling  paradigm  that  handles  the  needs  of  multimedia  systems  running  both 
real-time  and  conventional  applications.  This  scheduler  framework  has  been  implemented  in  a  version  of  the  Mach 
3.0  microkernel.  It  expands  on  previous  work  in  real-time  scheduling  to  support  modem  real-time  scheduling 
algorithms,  allow  experimentation  with  new  scheduling  policies,  and  enable  timesharing  and  background  jobs  to 
coexist  with  real-time  applications. 
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